
Scalable Exact Parent Sets Identification in Bayesian Networks Learning with Apache Spark


Abstract

In machine learning, the parent set identification problem is to find a set of random variables that best explains a selected variable, given the data and some predefined scoring function. This problem is a critical component of structure learning of Bayesian networks and of Markov blanket discovery, and thus has many practical applications, ranging from fraud detection to clinical decision support. In this paper, we introduce a new distributed-memory approach to the exact parent set assignment problem. To achieve scalability, we derive theoretical bounds to constrain the search space when the MDL scoring function is used, and we reorganize the underlying dynamic programming such that the computational density is increased and fine-grain synchronization is eliminated. We then design an efficient realization of our approach on the Apache Spark platform. Through experimental results, we demonstrate that the method maintains strong scalability on a 500-core standalone Spark cluster, and that it can be used to efficiently process data sets with 70 variables, far beyond the reach of currently available solutions.
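To make the problem statement concrete, the following is a minimal single-machine sketch of exact parent set identification under an MDL score. It is not the paper's algorithm: it enumerates candidate subsets by brute force, with no dynamic programming, no pruning bounds, and no Spark. The function names `mdl_score` and `best_parent_set`, the `max_size` cap, and the toy data are all hypothetical, and the score uses one common textbook form of MDL (empirical conditional entropy times the sample count, plus half of log2 N per free parameter), which may differ in detail from the formulation used by the authors.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mdl_score(data, child, parents):
    """MDL score of column `child` given columns `parents`; lower is better."""
    n = len(data)
    # Counts of (parent configuration, child value) and of parent configuration.
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marginal = Counter(tuple(row[p] for p in parents) for row in data)
    # Data-fit term: N * H(child | parents), in bits.
    fit = -sum(c * log2(c / marginal[pa]) for (pa, _), c in joint.items())

    # Complexity term: (log2 N / 2) * number of free parameters.
    def arity(v):
        return len({row[v] for row in data})

    n_params = arity(child) - 1
    for p in parents:
        n_params *= arity(p)
    return fit + 0.5 * log2(n) * n_params


def best_parent_set(data, child, candidates, max_size=None):
    """Exhaustively score every subset of `candidates` up to `max_size`."""
    max_size = len(candidates) if max_size is None else max_size
    best = (mdl_score(data, child, ()), ())
    for k in range(1, max_size + 1):
        for subset in combinations(candidates, k):
            best = min(best, (mdl_score(data, child, subset), subset))
    return best


# Toy data (assumed for illustration): column 2 is the XOR of columns 0 and 1,
# column 3 is unrelated to the others.
data = [(a, b, a ^ b, noise)
        for a in (0, 1) for b in (0, 1) for noise in (0, 1)] * 8

score, parents = best_parent_set(data, child=2, candidates=[0, 1, 3])
print(score, parents)
```

On this toy data the search selects columns 0 and 1 as the parent set of column 2, since together they determine it exactly while the unrelated column only adds penalty. The exhaustive loop visits every subset of the candidates, so its cost grows exponentially with the number of variables, which is the scaling problem the abstract's search-space bounds and distributed design are meant to address.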
